Search for: All records

Creators/Authors contains: "Wang, Peipei"


  1. Switchgrass lowland ecotypes have significantly higher biomass but lower cold tolerance than upland ecotypes. Understanding the molecular mechanisms underlying cold response, including those at the transcriptional level, can contribute to improving the tolerance of high-yield switchgrass under chilling and freezing conditions. Here, by analyzing an existing switchgrass transcriptome dataset, we computationally dissect the temporal cis-regulatory basis of the switchgrass transcriptional response to cold. We found that the number of cold-responsive genes and enriched Gene Ontology terms increased as the duration of cold treatment increased from 30 min to 24 hours, suggesting an amplified response/cascading effect in cold-responsive gene expression. To identify genomic sequences likely important for regulating cold response, machine learning models predictive of cold response were established using k-mer sequences enriched in the genic and flanking regions of cold-responsive genes but not of non-responsive genes. These k-mers, referred to as putative cis-regulatory elements (pCREs), are likely regulatory sequences of cold response in switchgrass. In total, there are 655 pCREs, of which 54 are important at all cold treatment time points. Consistent with this, eight of 35 known cold-responsive CREs were similar to top-ranked pCREs in the models, and only these eight were important for predicting temporal cold response. More importantly, most of the top-ranked pCREs were novel sequences in cold regulation. Our findings point to additional, previously unknown sequence elements important for cold-responsive regulation that warrant further study.
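The k-mer enrichment idea in the abstract above (k-mers over-represented near cold-responsive genes but not near non-responsive genes) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names, the choice of a one-sided Fisher's exact test, the value of k, and the significance cutoff are hypothetical and are not taken from the study, which may have used a different test and correction procedure.

```python
from math import comb


def kmers(seq, k):
    """Return the set of distinct k-mers present in one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability of observing at least `a` in the top-left
    cell when the row and column totals are held fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, x) * comb(row2, col1 - x) for x in range(a, upper + 1))
    return tail / denom


def enriched_kmers(responsive, non_responsive, k=6, alpha=0.05):
    """Map each k-mer enriched in `responsive` sequences to its p-value.

    Hypothetical helper: k and alpha are illustrative defaults, and no
    multiple-testing correction is applied.
    """
    pos_sets = [kmers(s, k) for s in responsive]
    neg_sets = [kmers(s, k) for s in non_responsive]
    candidates = set().union(*pos_sets) if pos_sets else set()
    results = {}
    for kmer in candidates:
        a = sum(kmer in s for s in pos_sets)   # responsive sequences with the k-mer
        c = sum(kmer in s for s in neg_sets)   # non-responsive sequences with the k-mer
        p = fisher_greater(a, len(pos_sets) - a, c, len(neg_sets) - c)
        if p < alpha:
            results[kmer] = p
    return results


if __name__ == "__main__":
    # Toy sequences, invented for this example only.
    responsive = ["TTACGTACTT", "GGACGTACGG", "CCACGTACCC", "ATACGTACAT", "TGACGTACTG"]
    non_responsive = ["TTTTTTTTTT", "GGGGGGGGGG", "CCCCCCCCCC", "ATATATATAT", "TGTGTGTGTG"]
    print(enriched_kmers(responsive, non_responsive))
```

In a real analysis the enriched k-mers would then serve as presence/absence features for a classifier, and the p-values would need correction for the number of k-mers tested.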
  2. Abstract

    The geometric phase of an electronic wave function, also known as the Berry phase, is the fundamental basis of the topological properties of solids. This phase can be tuned by modulating the band structure of a material, providing a way to drive a topological phase transition. However, despite significant efforts in designing and understanding topological materials, it remains challenging to tune a given material across different topological phases while tracing the impact of the Berry phase on its quantum transport properties. Here, we report these two effects in a magnetotransport study of ZrTe5. By tuning the band structure with uniaxial strain, we use quantum oscillations to directly map a weak-to-strong topological insulator phase transition through a gapless Dirac semimetal phase. Moreover, we demonstrate the impact of the strain-tunable spin-dependent Berry phase on the Zeeman effect through the amplitude of the quantum oscillations. We show that such a spin-dependent Berry phase, largely neglected in solid-state systems, is critical in modeling quantum oscillations in Dirac bands of topological materials.

     
  3. Abstract

    Background: Availability of plant genome sequences has led to significant advances. However, with few exceptions, the great majority of existing genome assemblies are derived from short-read sequencing technologies with highly uneven read coverage, indicative of sequencing and assembly issues that could significantly impact any downstream analysis of plant genomes. In tomato, for example, 0.6% (5.1 Mb) and 9.7% (79.6 Mb) of the short-read-based assembly had significantly higher and lower coverage than background, respectively.

    Results: To understand the causes of such uneven coverage, we first established machine learning models capable of predicting genomic regions with variable coverage and found that high coverage regions tend to have higher simple sequence repeat and tandem gene densities than background regions. To determine whether the high coverage regions were misassembled, we examined a recently available tomato long-read-based assembly and found that 27.8% (1.41 Mb) of high coverage regions were potentially misassemblies of duplicate sequences, compared to 1.4% in background regions. In addition, using a predictive model that can distinguish correctly and incorrectly assembled high coverage regions, we found that misassembled high coverage regions tend to be flanked by simple sequence repeats, pseudogenes, and transposon elements.

    Conclusions: Our study provides insights into the causes of variable coverage regions and a quantitative assessment of factors contributing to plant genome misassembly when using short reads. The generality of these causes and factors should be tested further in other species.
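The notion of regions with higher or lower coverage than background, as used in the abstract above, can be made concrete with a small sketch. This is only an illustration under stated assumptions: the window size, the ratio thresholds, and the use of the median window mean as "background" are hypothetical choices, not the statistical criteria used in the study.

```python
from statistics import median


def classify_windows(coverage, window=4, high_ratio=2.0, low_ratio=0.5):
    """Label fixed-size windows as 'high', 'low', or 'background' coverage.

    `coverage` is a non-empty list of per-base read depths. A window is
    'high' if its mean depth exceeds high_ratio times the median window
    mean, and 'low' if it falls below low_ratio times that median.
    All thresholds are illustrative assumptions.
    """
    means = []
    for start in range(0, len(coverage), window):
        chunk = coverage[start:start + window]
        means.append(sum(chunk) / len(chunk))
    background = median(means)
    labels = []
    for m in means:
        if m > high_ratio * background:
            labels.append("high")
        elif m < low_ratio * background:
            labels.append("low")
        else:
            labels.append("background")
    return labels


if __name__ == "__main__":
    # Toy depth profile, invented for this example only.
    depths = [30] * 4 + [90] * 4 + [30] * 4 + [5] * 4 + [30] * 4
    print(classify_windows(depths))
```

Labels of this kind could then be used as the prediction target for a classifier trained on sequence features such as repeat or tandem gene density.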
  4. Abstract

    Plants respond to wounding stress by changing gene expression patterns and inducing the production of hormones, including jasmonic acid. This wounding transcriptional response activates specialized metabolism pathways such as the glucosinolate pathways in Arabidopsis thaliana. While the regulatory factors and sequences controlling a subset of wound-response genes are known, it remains unclear how wound response is regulated globally. Here, we investigate how these responses are regulated by incorporating putative cis-regulatory elements, known transcription factor binding sites, in vitro DNA affinity purification sequencing, and DNase I hypersensitive sites to predict genes with different wound-response patterns using machine learning. We observed that regulatory sites and regions of open chromatin differed between genes upregulated at early and late wounding time points, as well as between genes induced by jasmonic acid and those not induced. Expanding on what is currently known, we identified cis-elements that improved model predictions of expression clusters over known binding sites. Using a combination of genome editing, in vitro DNA-binding assays, and transient expression assays with native and mutated cis-regulatory elements, we experimentally validated four of the predicted elements, three of which were not previously known to function in wound-response regulation. Our study provides a global model predictive of wound response and identifies new regulatory sequences important for wounding without requiring prior knowledge of the transcriptional regulators.

     
  6. de Meaux, Juliette (Ed.)
    Abstract

    Genetic redundancy refers to a situation where an individual with a loss-of-function mutation in one gene (single mutant) does not show an apparent phenotype until one or more paralogs are also knocked out (double/higher-order mutant). Previous studies have identified some characteristics common among redundant gene pairs, but a predictive model of genetic redundancy incorporating a wide variety of features derived from accumulating omics and mutant phenotype data is yet to be established. In addition, the relative importance of these features for genetic redundancy remains largely unclear. Here, we establish machine learning models for predicting whether a gene pair is likely redundant in the model plant Arabidopsis thaliana based on six feature categories: functional annotations, evolutionary conservation including duplication patterns and mechanisms, epigenetic marks, protein properties including post-translational modifications, gene expression, and gene network properties. The definition of redundancy, data transformations, feature subsets, and machine learning algorithms used significantly affected model performance on hold-out testing phenotype data. Among the most important features in predicting gene pairs as redundant were having one or more paralogs from recent duplication events, annotation as a transcription factor, downregulation during stress conditions, and having similar expression patterns under stress conditions. We also explored the potential reasons underlying mispredictions and the limitations of our study. This genetic redundancy model sheds light on characteristics that may contribute to long-term maintenance of paralogs and will ultimately allow for more targeted generation of functionally informative double mutants, advancing functional genomic studies.